This tutorial covers:
Downloading flow data from USGS gauges
Basic plotting of hydrograph data
“Wrangling” the data into a more flexible format
Calculating basic hydrograph metrics
Flow Duration Curves
Annual Flood Frequency
Empirical return intervals
You will need to install the dataRetrieval and tidyverse packages:

* You only need to do this once per machine
* Copy the following code into your console and run it (press Enter)
* Do NOT put this code into your script

install.packages("dataRetrieval")
install.packages("tidyverse")
Now that we have the packages installed, we can start a new script (white square with a green plus sign icon in the top left) and begin our analysis.
library(dataRetrieval)
library(tidyverse)
We will use the dataRetrieval package to download data. For a more complete tutorial on the dataRetrieval package, see this website.
Here, we will download data from the USGS gauge on the Colorado River near the UT-CO stateline
Save variables to make download easier
siteNo <- "USGS-09163500" #CO River near CO-UT stateline
# you will need to look up a site code for a different gauge for your assignment
pCode <- "00060" #discharge
# all daily data from 2003-2023
start.date <- "2003-10-01"
end.date <- "2023-12-31"
# For your assignment, make sure there are at least 10 years of data, and change the dates as appropriate
We use the read_waterdata_daily() function to download the data, and save it as an object called co_r:

co_r <- read_waterdata_daily(
  monitoring_location_id = siteNo,
  parameter_code = pCode,
  time = c(start.date, end.date))
## Requesting:
## https://api.waterdata.usgs.gov/ogcapi/v0/collections/daily/items?f=json&lang=en-US&limit=10000&monitoring_location_id=USGS-09163500&parameter_code=00060&time=2003-10-01%2F2023-12-31
# just the first few rows:
co_r
# statistic_id == 00003 means these are daily mean values

You can View() the whole thing with this code: View(co_r)
Make a plot with time on the x-axis and value on the y-axis:

* time is actually the date
* value is in CFS (we will change the name of the column later for simplicity)
* Include an informative title
* Add units to the y-axis label
ggplot(co_r,
aes(x = time,
y = value)) +
geom_line() +
labs(title= "Streamflow for the Colorado River",
subtitle = "Gage 09163500 near the CO-UT border",
y = "Streamflow (CFS)") +
theme_bw()
* Qualitatively describe this flow pattern
* Consider the:
* Magnitude
* Frequency
* Duration
* Timing
* Rate of change
Your data may have too many years to easily view the flow regime. We will create a y column to indicate the year, then filter to a single year value and plot that:
co_r |>
mutate(y = year(time)) |>
filter(y == 2015) |> # Make sure you change this to match one year which occurs in your data
ggplot(
aes(x = time,
y = value)) +
geom_line() +
labs(title= "Streamflow for the Colorado River in 2015", # make sure year in title matches
subtitle = "Gage 09163500 near the CO-UT border",
y = "Streamflow (CFS)") +
theme_bw()
The two columns we care about are:

* time: in this case, the date in YYYY-MM-DD format
* value: the mean discharge (Q) for that day

There is also a unit_of_measure column, but it's always CFS, so we will just have to remember that.

# remove "geometry" attributes
attr(co_r, "class") <- "data.frame"
# overwrite co_r to just be the two columns
# and rename value column to be "q"
co_r <- co_r |>
select(time, value)|>
rename(q = value)
co_r
Recall that an FDC is calculated as:
\[ \text{% Exceedance} = \frac{m}{N+1}*100 \]
where \(m\) is the rank of the flow, and \(N\) is the number of observations (number of days) in the data
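Before applying this to the real record, here is a minimal sanity check of the formula using a made-up ten-day flow series (the numbers are hypothetical, not Colorado River data):

```r
library(dplyr)

# hypothetical ten-day flow record (CFS)
toy_pe <- tibble(q = c(120, 450, 80, 300, 95, 510, 60, 210, 150, 400)) |>
  arrange(-q) |>                          # sort descending
  mutate(m_rank = 1:n()) |>               # rank 1 = largest flow
  mutate(PE = m_rank / (n() + 1) * 100)   # % exceedance

toy_pe
```

With N = 10, the largest flow (510 CFS) gets PE = 1/11 × 100 ≈ 9.1%, meaning it was met or exceeded about 9% of the time.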
daily_pe <- co_r |>
  # sort by the q column, descending (largest flow first)
  arrange(-q) |>
  # add a "rank" column
  mutate(m_rank = 1:n()) |>
  # calculate % exceedance
  mutate(PE = (m_rank / (n() + 1)) * 100)
# print out the new data object
daily_pe
ggplot(daily_pe,
aes(x = PE,
y = q)) +
geom_line()
* FDCs are usually plotted with logarithmic y- and x-axes
ggplot(daily_pe,
aes(x = PE,
y = q)) +
geom_line() +
scale_y_log10() +
scale_x_log10()
* we can also add a title and axis labels for clarity
ggplot(daily_pe,
aes(x = PE,
y = q)) +
geom_line() +
scale_y_log10() +
scale_x_log10() +
labs(title = "Daily Flow Duration Curve",
subtitle = "CO River near CO-UT stateline",
y = "Q (CFS)",
x = "Percent Exceedance")
We can filter the daily_pe object to see what flow equates to a specific percent exceedance:

# 10% exceedance?
daily_pe |>
  filter(PE == 10)

An exact match of PE == 10 is unlikely to exist, so filter a window around the target instead:

daily_pe |>
  filter(PE >= 9.9,
         PE <= 10.1)

If that window returns too many rows, tighten the bounds in the filter command:

daily_pe |>
  filter(PE > 9.99,
         PE < 10.01)
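Guessing bounds can take a few tries. As a convenience (my own alternative, not part of the original workflow), dplyr's slice_min() grabs the row whose PE is closest to the target directly. A self-contained sketch with a toy stand-in for daily_pe:

```r
library(dplyr)

# toy stand-in for daily_pe (hypothetical values)
toy <- tibble(q = c(500, 400, 300), PE = c(9.5, 50.2, 90.1))

# keep the single row whose PE is nearest to 10%
nearest <- toy |>
  slice_min(abs(PE - 10), n = 1)
nearest
```

On the real data, the equivalent would be daily_pe |> slice_min(abs(PE - 10), n = 1).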
So, the flow in the Colorado River met or exceeded 11400 CFS only 10% of the time. In other words, a flow of 11400 CFS was relatively rare.
What are common flow values?
daily_pe |>
filter(PE > 74.99,
PE < 75.01)
Next we will calculate flood return intervals from the annual maxima in the co_r data set. Recall that the recurrence interval is calculated as:

\[ R_{i} = \frac{N+1}{m} \]

* Where \(R_i\) is the recurrence interval (in years)
* \(m\) is the rank of the annual maximum flow
* \(N\) is the number of years in the data set
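As a worked example of the formula, here is a hypothetical five-year annual-maximum series (toy numbers, not the real gauge record):

```r
library(dplyr)

# hypothetical annual maxima (CFS) for N = 5 years
toy_ri <- tibble(high_flow = c(9000, 15000, 7000, 12000, 20000)) |>
  arrange(-high_flow) |>
  mutate(m_rank = 1:n()) |>
  mutate(Ri = (n() + 1) / m_rank)  # (N + 1) / m

toy_ri
```

The largest flow (20000 CFS) gets Ri = 6/1 = 6 years; the smallest gets Ri = 6/5 = 1.2 years.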
Our data right now shows the mean flow value for every day across our timeframe. For the annual maximum series (AMS), we want the single largest value for every year in the dataset.
In other words, we need to group the data by year and only keep the maximum value
Currently, co_r has a time column which
has the date information in it
There are many methods for sorting and grouping data which is in
a date-format (YYYY-MM-DD)
The method we are going to use is to make separate columns for year, month, and day (they will be named y, m, and d).
We can do this using handy functions that are a part of the
tidyverse package we already loaded
We will add the y, m, and d columns:

* We use the mutate() function to add new columns
* mutate() adds a column but does not change the number of rows

co_r |>
  mutate(y = year(time))
* This creates a new column y for the year
* It works by looking at the time column and extracting the value for the year in that column
* This only works because the time column was already in a date format

co_r # note the <date> notation underneath time
# also note the <dbl> notation under q. dbl is short for "double", which means it is a number format which can contain decimals
co_r |>
pull(time) |>
class() # the output here says "Date"
## [1] "Date"
If you use the year() function and it doesn't work, it most likely means the data is in the wrong format. There are also month() and day() functions which will extract the relevant date information.

# overwrite co_r to have the date columns
co_r <- co_r |>
mutate(y = year(time),
m = month(time),
d = day(time))
# print out our new co_r to see what it looks like
co_r
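One common hiccup (an aside, assuming your dates use the usual YYYY-MM-DD layout): if the time column arrives as character text rather than <date>, convert it with as.Date() before using year(), month(), or day():

```r
library(lubridate)  # loaded as part of the tidyverse; provides year(), month(), day()

# a character string is not a Date until converted
d <- as.Date("2015-06-01")
class(d)   # "Date"
year(d)    # 2015

# on the real data, the fix would look like:
# co_r <- co_r |> mutate(time = as.Date(time))
```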
We keep the time column so that we have the date information in case we need it later. Now we group the data by year (the y column):

co_r |>
  group_by(y)
Note that the data looks the same, but there is now a
Groups: y[21] line at the top of the data output
this is saying that the data now contains groups based on values
in the y column, and there are 21 unique
groups
Now we summarize() the data to keep the single
largest value per group/year
The summarize() function reduces the data to have a
single row per group
# ams is short for annual maximum series
ams <- co_r |>
group_by(y) |>
summarize(high_flow = max(q))
# print out the new object
ams
The ams object now has 21 rows, one per group/year. Now we rank the annual maxima and calculate return intervals:

ams_ri <- ams |>
  # sort by the high_flow column, descending (largest first)
  arrange(-high_flow) |>
  # add a "rank" column
  mutate(m_rank = 1:n()) |>
  # calculate the return interval, and the probability
  mutate(Ri = (n() + 1) / m_rank,
         pr = 1 / Ri)
# print out the new data object
ams_ri
The annual bankfull flow generally has a return interval between 1.1 and 2 years
What is the bankfull flow for the Colorado River at this station?
ams_ri |>
filter(Ri <= 2,
Ri >= 1.1)
This returns a broad range of values: from 5,090 to 23,500 CFS
Let’s say the bankfull flow here has a return interval of 1.5 years.
We can see above that we don’t have that exact return interval, but we can filter values close to that:
ams_ri |>
filter(Ri <= 1.6,
Ri >= 1.4)
What flow approximately represents the 10-year flood?
Recall that it is unlikely there is exactly a return interval of 10, so use bounds to capture the location
ams_ri |>
filter(Ri <= 15,
Ri >= 7)
ams_ri |>
filter(Ri <= 100,
Ri >= 40)
This means that there is no data with an \(R_i\) between 40 and 100
Let’s see what the maximum \(R_i\) value in our data is
ams_ri |>
filter(Ri == max(Ri))
In this data, the biggest return interval is 22 years, and has a flow of 46800 CFS
For this data, all we can say about the 50-year flood is that it is greater than 46800 CFS
Note that there is a method for extrapolating return intervals based on the log-Pearson Type III distribution
Unfortunately, this analysis is beyond the scope of this course, but I want you to be aware of its existence
Note that the data you find for the assignment may be more extensive than this, so you may be able to report more accurate estimates for floods with higher return intervals
\[ \text{Risk} = 1 - (1 - P)^n \]

* What's the risk of having a flood with a 0.15 probability in the next 2 years?
\(P = 0.15\) and \(n = 2\)
We can use R as a fancy calculator and plug these numbers in
1 - (1 - 0.15)^2
## [1] 0.2775
So there is a 0.2775 or 27.75% chance of having that flood occur in the next two years
What’s the risk of having a flood with a return interval of 50 years in the next 10 years?
here \(n = 10\) but we have to calculate \(P\)
\[ P = 1 / T \]

* Where \(T\) is the return interval in years
1 - (1 - (1/50))^10
## [1] 0.1829272
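The two calculations above can be wrapped into a small helper function (my own convenience wrapper, not from any package), which converts a return interval T into P = 1/T and applies the risk formula:

```r
# risk of experiencing at least one t_years-year flood in the next n years
flood_risk <- function(t_years, n) {
  p <- 1 / t_years   # annual exceedance probability, P = 1/T
  1 - (1 - p)^n
}

flood_risk(50, 10)  # the 50-year flood over the next 10 years, matching the calculation above
```

For the first example, where P = 0.15 was given directly, you would still plug in 1 - (1 - 0.15)^2 by hand.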